skip to main content


Search for: All records

Creators/Authors contains: "Jun, Sang-Woo"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. null (Ed.)
  2. We present ColumnBurst, a memory-efficient, near-storage hardware accelerator for database join queries. While the paradigm of near-storage computation has demonstrated performance and efficiency benefits on many workloads by reducing data movement overhead, memory-bound operations such as relational joins on unsorted data have been relatively inefficient with fast modern storage devices, due to the limited capacity and performance of memory available on the near-storage processing engine. ColumnBurst delivers very high performance even on such complex queries, while staying within the memory performance and capacity budget of what is typically already available on off-the-shelf storage devices. ColumnBurst achieves this via a compact, hardware implementation of sorting-based group-by aggregation and join algorithms, instead of the conventional hash-based algorithms. We evaluate ColumnBurst using an FPGA-based prototype with 1 GB of slow on-device DDR3 DRAM, and show that on benchmarks including TPC-H queries with join queries on unsorted columns, it outperforms MonetDB on a 6-core i7 with 32 GB of DRAM by over 7x, and ColumnBurst using a near-storage hash join algorithm by 2x. 
    more » « less
  3. We present BurstZ, a bandwidth-efficient accelerator platform for scientific computing. While accelerators such as GPUs and FPGAs provide enormous computing capabilities, their effectiveness quickly deteriorates once the working set becomes larger than the on-board memory capacity, causing the performance to become bottlenecked either by the communication bandwidth between the host and the accelerator. Compression has not been very useful in solving this issue due to the difficulty of efficiently compressing floating point numbers, which scientific data often consists of. Most compression algorithms are either ineffective with floating point numbers, or has a high performance overhead. BurstZ is an FPGA-based accelerator platform which addresses the bandwidth issue via a novel hardware-optimized floating point compression algorithm, which we call sZFP. We demonstrate that BurstZ can completely remove the communication bottleneck for accelerators, using a 3D stencil-code accelerator implemented on a prototype BurstZ implementation. Evaluated against hand-optimized implementations of stencil code accelerators of the same architecture, our BurstZ prototype outperformed an accelerator without compression by almost 4X, and even an accelerator with enough memory for the entire dataset by over 2X. BurstZ improved communication efficiency so much, our prototype was even able to outperform the upper limit projected performance of an optimized stencil core with ideal memory access characteristics, by over 2X. 
    more » « less
  4. We describe GraFBoost, a flash-based architecture with hardware acceleration for external analytics of multi-terabyte graphs. We compare the performance of GraFBoost with 1 GB of DRAM against various state-of-the-art graph analytics software including FlashGraph, running on a 32-thread Xeon server with 128 GB of DRAM. We demonstrate that despite the relatively small amount of DRAM, GraFBoost achieves high performance with very large graphs no other system can handle, and rivals the performance of the fastest software platforms on sizes of graphs that existing platforms can handle. Unlike in-memory and semi-external systems, GraFBoost uses a constant amount of memory for all problems, and its performance decreases very slowly as graph sizes increase, allowing GraFBoost to scale to much larger problems than possible with existing systems while using much less resources on a single-node system. The key component of GraFBoost is the sort-reduce accelerator, which implements a novel method to sequentialize fine-grained random accesses to flash storage. The sort-reduce accelerator logs random update requests and then uses hardware-accelerated external sorting with interleaved reduction functions. GraFBoost also stores newly updated vertex values generated in each superstep of the algorithm lazily with the old vertex values to further reduce I/O traffic. We evaluate the performance of GraFBoost for PageRank, breadth-first search and betweenness centrality on our FPGA-based prototype (Xilinx VC707 with 1 GB DRAM and 1 TB flash) and compare it to other graph processing systems including a pure software implementation of GrapFBoost. 
    more » « less